Power Analysis

Jasper Slingsby

Power Analysis


No one ever does them…


…but they could save so much pain and suffering if they did!!!

Power Analysis


Statistical power is the probability of a hypothesis test finding an effect if there is an effect to be found.


Power analysis is a calculation typically used to estimate the smallest sample size needed for an experiment, given a required significance level, statistical power, and effect size.

  • It is normally conducted before the data collection!
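
As a rough sketch of what such a calculation can look like, base R's power.t.test() will solve for the sample size when the other quantities are supplied. The effect size (0.5), standard deviation (1) and target power (0.8) below are made-up planning values for illustration, not recommendations.

```r
# Hypothetical planning values, chosen only for illustration
res <- power.t.test(delta = 0.5,            # smallest difference we care about
                    sd = 1,                 # expected standard deviation
                    sig.level = 0.05,       # alpha
                    power = 0.8,            # desired statistical power
                    type = "one.sample",
                    alternative = "two.sided")

# n is left out of the call, so the function solves for it
n_needed <- ceiling(res$n)  # round up to a whole number of subjects
n_needed
```

Leaving out any one of n, delta, sd, sig.level or power tells the function to solve for that quantity.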

Why do power analysis?


Firstly, it helps you plan your analyses before you’ve done your data collection, which is always useful.


Secondly, not knowing the statistical power of your analysis can result in

  • missed findings (through Type II Error), or
  • false findings (through Type I Error).

Why do power analysis?


Type II Error:

  • occurs when the researcher erroneously concludes that there is not a difference between treatments, when in reality there is…
  • this is a common outcome of low statistical power

Why do power analysis?


Type I Error:

  • occurs when the researcher erroneously concludes that there is a difference between treatments, when in reality there is not…
  • its long-run rate is set by the \(\alpha\) level rather than by statistical power, but false findings still crop up with small samples of highly variable subjects, or if there is bias in sampling…

Why do power analysis?

Type I and Type II Errors and how they result in false or missing findings, respectively. Image from Norton and Strube 2001, JOSPT.

Statistical Power


Is determined by the combination of the:

  • \(\alpha\) (“significance”) level required (e.g. P < 0.05)
  • difference between group means (effect size)
  • variability among subjects
  • sample size (the factor we usually have most control over)
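
To see which way each of these four factors pushes, here is a minimal sketch using power.t.test() for a one-sample test. The helper power_for() and its default values are hypothetical, invented for this comparison.

```r
# Hypothetical helper: power of a one-sample, two-sided t-test
power_for <- function(n = 20, delta = 0.5, sd = 1, alpha = 0.05) {
  power.t.test(n = n, delta = delta, sd = sd, sig.level = alpha,
               type = "one.sample")$power
}

baseline       <- power_for()              # reference scenario
stricter_alpha <- power_for(alpha = 0.01)  # stricter alpha        -> less power
bigger_effect  <- power_for(delta = 1)     # larger effect size    -> more power
more_variable  <- power_for(sd = 2)        # more variability      -> less power
bigger_sample  <- power_for(n = 40)        # larger sample size    -> more power

round(c(baseline = baseline, stricter_alpha = stricter_alpha,
        bigger_effect = bigger_effect, more_variable = more_variable,
        bigger_sample = bigger_sample), 2)
```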

\(\alpha\) (“significance”) level

We usually use an \(\alpha\) of 0.05 to indicate significant difference.

  • i.e. if the null hypothesis is true, the probability of observing a result at least this extreme is less than 5% (i.e. p < 0.05), so we would expect to wrongly declare a difference in no more than 1 of every 20 such samples.

This is a subjective cut-off, but is generally accepted in the literature…
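
One way to convince yourself of what \(\alpha\) means is to simulate many datasets where the null hypothesis really is true and count how often the test comes out "significant". This is a sketch under assumed settings (30 observations per dataset, 2000 repeats, an arbitrary seed).

```r
set.seed(1)  # arbitrary seed so the run is repeatable

# Simulate 2000 datasets where the true mean really is 0 (the null is true)
p_null <- replicate(2000, t.test(rnorm(30, mean = 0, sd = 1), mu = 0)$p.value)

# Proportion of "significant" results, i.e. the Type I error rate
false_positive_rate <- mean(p_null < 0.05)
false_positive_rate
```

The proportion should land close to 0.05, because that is exactly the false positive rate we agreed to accept.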

Difference between group means

You have greater statistical power when you have greater differences in means (effect size). P1 vs P3 has greater power than either P1 vs P2 or P2 vs P3.

Variability among subjects

Greater variability among subjects results in larger standard deviations, reducing our ability to distinguish among groups (i.e. statistical power).

Sample size

Increasing sample size increases statistical power by improving the estimate of the mean and constricting the distribution of the test statistic (i.e. reducing the standard error (SE)).
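
The standard error of the mean is the standard deviation divided by the square root of the sample size, so a quick calculation shows how it shrinks. The standard deviation of 1 and the sample sizes are illustrative values only.

```r
sd_subjects <- 1            # assumed standard deviation among subjects
n  <- c(25, 100, 400)       # illustrative sample sizes
se <- sd_subjects / sqrt(n) # standard error of the mean

data.frame(n = n, se = se)
```

Note the diminishing returns: each halving of the standard error needs four times as many samples.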

How do we do power analysis?

Simulate the data you would expect to collect in your study, varying the:

  • \(\alpha\) (“significance”) level (e.g. P < 0.05)
  • difference between group means (effect size)
  • variability among subjects
  • sample size (the factor we usually have most control over)

…and test for significant difference using the appropriate statistical test.
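
The recipe above can be sketched as a small function: simulate a dataset, test it, repeat many times, and report the share of tests that were significant. sim_power() and its argument names are hypothetical, written for this example, and it assumes normally distributed data and a one-sample t-test.

```r
# Hypothetical helper: estimate power by simulation
sim_power <- function(n, true_mean, true_sd, mu0 = 0,
                      alpha = 0.05, n_sims = 1000) {
  p_values <- replicate(n_sims, {
    x <- rnorm(n, mean = true_mean, sd = true_sd)  # one simulated study
    t.test(x, mu = mu0)$p.value                    # test it
  })
  mean(p_values < alpha)  # proportion of simulated studies that found the effect
}

set.seed(42)  # arbitrary seed
est_power <- sim_power(n = 50, true_mean = 1, true_sd = 2, mu0 = 0.5)
est_power
```

The appeal of simulating is that you can swap rnorm() and t.test() for whatever data-generating process and test suit your own study.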

Let’s run through a few examples

Simulating data

First, we need to simulate some data.

If we believe our data are normally distributed, we can use the handy rnorm() function, like so:

dat <- rnorm(n = 50, # set the sample size
             mean = 1, # set the mean to = 1
             sd = 1) # set the standard deviation to = 1
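
Because rnorm() draws random numbers, every run gives different values, so your numbers will not match the output shown on these slides. If you want your own runs to be repeatable, one option is to fix the random seed first. The seed value 123 is arbitrary.

```r
set.seed(123)  # any fixed integer works
dat_a <- rnorm(n = 50, mean = 1, sd = 1)

set.seed(123)  # resetting the seed replays the same random draws
dat_b <- rnorm(n = 50, mean = 1, sd = 1)

identical(dat_a, dat_b)
```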

Simulating data

Now let’s look at our new data

  • This is easier if we make it a data frame
df <- data.frame(Data = dat, Treatment = 1)

head(df)
        Data Treatment
1 0.12489733         1
2 2.29397240         1
3 0.72392121         1
4 1.30838083         1
5 0.60268929         1
6 0.05662562         1

Simulating data

We can plot it like so:

library(ggplot2) # load the plotting package

ggplot(df, aes(Data, fill = factor(Treatment), colour = factor(Treatment))) + # treat Treatment as a category
  geom_density(alpha = 0.1)

One sample t-test

Tests the hypothesis that the mean of our population is a specific value (e.g. 0).

t.test(x = df$Data, # set our vector of data values
       alternative = "two.sided", # the alternative hypothesis is "not zero", so the test is two-sided (versus "greater" or "less")
       mu = 0) # set the "true value" of the mean 

    One Sample t-test

data:  df$Data
t = 7.4549, df = 49, p-value = 1.314e-09
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 0.6873307 1.1946444
sample estimates:
mean of x 
0.9409876 

In this case, the difference is highly significant! P < 0.00000001!!!

One sample t-test

What if we fiddle with the \(\alpha\) (“significance” level)?

  • You usually wouldn’t do this!!!

but

  • With one-sample t-tests you effectively do this when you choose your alternative hypothesis.
    • We set it to be “two-sided” because our alternative was that the mean is “not zero”. This means the result is only significant if the test statistic falls in the upper or lower 2.5% of its distribution under the null.
  • If our alternative hypothesis was that the mean is “greater” or “less” than zero, we could set that instead, and the result would be significant if the test statistic falls in the upper 5% or the lower 5% of the distribution, respectively.
    • i.e. setting the alternative to “greater” or “less” makes the test more sensitive in that one direction, similar to increasing the \(\alpha\)
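
A small sketch of the one-sided versus two-sided point, using freshly simulated data (true mean 0.3, arbitrary seed) rather than the df from the earlier slides:

```r
set.seed(1)  # arbitrary seed
x <- rnorm(50, mean = 0.3, sd = 1)  # illustrative data with a small positive effect

p_two     <- t.test(x, alternative = "two.sided", mu = 0)$p.value
p_greater <- t.test(x, alternative = "greater",   mu = 0)$p.value

c(two_sided = p_two, greater = p_greater)
```

When the sample mean falls on the side named in the alternative, the one-sided p-value is half the two-sided one. If it falls on the other side, the one-sided test cannot detect it at all, which is why the direction has to be chosen before looking at the data.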

One sample t-test

Now let’s fiddle with the difference between group means (effect size).

In this case this is easiest done by shifting the mu to closer to the mean of our randomly generated data, like so

t.test(x = df$Data, # set our vector of data values
       alternative = "two.sided", # specify the alternative hypothesis
       mu = 0.5) # set the "true value" of the mean 

    One Sample t-test

data:  df$Data
t = 3.4937, df = 49, p-value = 0.00102
alternative hypothesis: true mean is not equal to 0.5
95 percent confidence interval:
 0.6873307 1.1946444
sample estimates:
mean of x 
0.9409876 

Here we’ve reduced the effect size from 1 to 0.5, but the result is still significantly different.

One sample t-test

Now let’s fiddle with variability among subjects.

# Make new data with greater variability (standard deviation = 2)
df <- data.frame(Data = 
                  rnorm(n = 50, # set the sample size
                        mean = 1, # set the mean
                        sd = 2), # set bigger standard deviation
                 Treatment = 1)

# Run t-test
t.test(x = df$Data,
       alternative = "two.sided", 
       mu = 0.5)

    One Sample t-test

data:  df$Data
t = 1.1985, df = 49, p-value = 0.2365
alternative hypothesis: true mean is not equal to 0.5
95 percent confidence interval:
 0.2341031 1.5516756
sample estimates:
mean of x 
0.8928893 

With double the variability (standard deviation), and an effect size of 0.5, the result is no longer significantly different…
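
A single non-significant draw does not tell us much on its own, so it helps to ask how often a test with these settings would succeed. As an analytical shortcut (not part of the simulation workflow used so far), power.t.test() can estimate this for n = 50, an effect size of 0.5 and a standard deviation of 2.

```r
# Power of the one-sample t-test under the settings used on this slide
pw <- power.t.test(n = 50,         # sample size
                   delta = 0.5,    # true mean (1) minus tested value (0.5)
                   sd = 2,         # standard deviation
                   sig.level = 0.05,
                   type = "one.sample")$power
pw
```

The power comes out below one half, so under these settings a missed finding (Type II Error) is the more likely outcome.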

One sample t-test

Now let’s increase the sample size.

# Make new data with greater sample size (n = 100)
df <- data.frame(Data = 
                  rnorm(n = 100, # set the sample size
                        mean = 1, # set the mean
                        sd = 2), # set bigger standard deviation
                 Treatment = 1)

# Run t-test
t.test(x = df$Data,
       alternative = "two.sided", 
       mu = 0.5)

    One Sample t-test

data:  df$Data
t = 2.2564, df = 99, p-value = 0.02624
alternative hypothesis: true mean is not equal to 0.5
95 percent confidence interval:
 0.5510605 1.2954907
sample estimates:
mean of x 
0.9232756 

Aha! Greater sample size gives us a significant result again (p = 0.026).
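
To see the effect of sample size more systematically than from one random draw, here is a sketch that computes power over a range of sample sizes for the same effect size (0.5) and standard deviation (2). The grid of sample sizes is an arbitrary choice for illustration.

```r
ns <- c(25, 50, 100, 200, 400)  # illustrative sample sizes

# Power of the one-sample t-test at each sample size
power_by_n <- sapply(ns, function(n) {
  power.t.test(n = n, delta = 0.5, sd = 2, sig.level = 0.05,
               type = "one.sample")$power
})

data.frame(n = ns, power = round(power_by_n, 2))

# Smallest sample size in the grid that reaches 80% power
n_min <- min(ns[power_by_n >= 0.8])
n_min
```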
